MINIMAL PERTURBATIONS TO ROOTS OF PARAMETERIZED EQUATIONS 
Abstract. The size of minimal perturbations to roots of parameterized equations can be estimated reliably from
linearizations of the equations.
1. Introduction. This paper offers a systematic way to answer the question: how much
change must occur in a solution of equations to compensate for perturbations to the
equations? Short of finding all the nearby roots of the new equations, the minimal change can
be determined in an asymptotic sense by linearizing the equations and considering the dual
problems. This conclusion is exhaustive because all nearby roots are considered, and strong
because the asymptotics imply differential approximations.

The asymptotic relationship is proved here. Companion papers make applications to
differentiability of best approximations and to numerical analysis.
2. Approach.

2.1. Introduction. Let the equations be $F(y, x) = 0$ with the specific root $(y_0, x_0)$. The
variable $x$ is regarded as the parameter so that $y$ depends on $x$ constrained by $F(y, x) = 0$.
The equations for $y$ may be underdetermined so $y$ may not be a function of $x$. Nevertheless,
the size of minimal perturbations to $y_0$ is a function,

(2.1)    $\mu_F(x) = \min_{y \,:\, F(y,x) = 0} \|y - y_0\|$.

The idea is to study the value of this optimization problem by linearizing the equations. There
are two requirements for the altered problems:

1. The values of the simplified problems should mimic how $\mu_F(x)$ varies with $x$.

2. Since $\mu_F(x)$ is of interest when $x \approx x_0$, good mimicry is needed near $x_0$.

The novelty of the present approach is to formalize these requirements by equivalence
relations, $\equiv$, among functions of $x$; two equivalences are chosen in section 2.2. Problem (2.1) is
then altered by linearizing $F$; three linearizations, $F^{(i)}$, are constructed in section 2.3. The
bulk of the paper establishes equivalences $\mu_F \equiv \mu_{F^{(i)}}$. For simplicity, the values of the
altered problems are written $\mu_{F^{(i)}} = \mu_i$.

2.2. Equivalence Relations. The following equivalence relation is appropriate when
differentiability at $x_0$ is the object of study.

DEFINITION 2.1 (Differential equivalence). The functions $f$ and $g$ defined on a
neighborhood of $x_0 \in \mathbb{R}^n$ with values in $\mathbb{R}^p$ are differentially equivalent at $x_0$ provided $f - g$ has
a Fréchet derivative of 0 at $x_0$, equivalently,

(2.2)    $f$ and $g$ are differentially equivalent at $x_0$  $\iff$  $\lim_{x \to x_0} \dfrac{\|f(x) - g(x)\|}{\|x - x_0\|} = 0$.

LEMMA 2.2. Differential equivalence is an equivalence relation. (This lemma is clear
and not proved.)
If $g$ is an affine function, then equation (2.2) becomes the definition for the Fréchet
derivative of $f$ at $x_0$. In this way the differential properties of $f$ at $x_0$ are determined by the
differential equivalence class.

A simpler but stronger equivalence relation is that real-valued functions should be
relatively closer as $x$ approaches $x_0$.

DEFINITION 2.3 (Asymptotic equality). The real-valued functions $f$ and $g$ defined on a
neighborhood of $x_0 \in \mathbb{R}^n$ are asymptotically equal at $x_0$ provided for every $\epsilon > 0$ there is a
neighborhood $N(\epsilon)$ of $x_0$ such that $x \in N(\epsilon)$ implies

(2.3)    $(1 - \epsilon)\, g(x) \le f(x) \le (1 + \epsilon)\, g(x)$.

LEMMA 2.4. Asymptotic equality is an equivalence relation. (This lemma is clear and
not proved.)

Asymptotic equality is stronger than differential equivalence. For example, all functions
with vanishing derivatives at 0 are differentially equivalent there, but two monomials $c_1 x^{n_1}$
and $c_2 x^{n_2}$ are asymptotically equal at 0 if and only if they are equal.
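The contrast between the two relations can be checked numerically. The following is only an illustrative sketch: the functions `f` and `g` are hypothetical examples chosen here (two unequal monomials with vanishing derivative at 0), not functions taken from the paper.

```python
# Hypothetical example functions of a real variable near x0 = 0.
def f(x):
    return x * x

def g(x):
    return 2.0 * x * x

def diff_quotient(x, x0=0.0):
    """|f(x) - g(x)| / |x - x0|, the quotient whose limit appears in (2.2)."""
    return abs(f(x) - g(x)) / abs(x - x0)

def ratio(x):
    """f(x) / g(x); asymptotic equality (2.3) would force this toward 1."""
    return f(x) / g(x)

samples = [10.0 ** (-k) for k in range(1, 8)]
quotients = [diff_quotient(x) for x in samples]  # shrink with x
ratios = [ratio(x) for x in samples]             # stay away from 1
```

The quotients shrink in proportion to $x$, consistent with differential equivalence at 0, while the ratios never approach 1, so the two monomials are not asymptotically equal there.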

For the function $\mu_F(x)$ in equation (2.1), asymptotic equality implies differential
equivalence. The proof of this implication in lemma 2.6 depends on a modified implicit function
theorem in lemma 2.5, and on the Lipschitz continuity of $\mu_F(x)$ at $x_0$.

HYPOTHESIS 2.1. Hypotheses 1-4 are used throughout this paper, while 5 or 6 are used
occasionally.

1. Norms are given for $\mathbb{R}^m$, $\mathbb{R}^n$ and $\mathbb{R}^p$.

2. $\mathcal{D} \subset \mathbb{R}^m \times \mathbb{R}^n$ is a neighborhood of $(y_0, x_0)$.

3. $F : \mathcal{D} \to \mathbb{R}^p$ is continuously Fréchet differentiable.

4. $F(y_0, x_0) = 0$.

5. $D_1 F(y_0, x_0) : \mathbb{R}^m \to \mathbb{R}^p$ is onto.

6. $D_2 F(y_0, x_0) : \mathbb{R}^n \to \mathbb{R}^p$ is one-to-one.

LEMMA 2.5 (Modified implicit function theorem). Under hypotheses 2.1 (1-5), there is
a neighborhood $N$ of $x_0$ and a Fréchet differentiable function $\phi : N \to \mathbb{R}^m$ with $\phi(x_0) = y_0$
and $F(\phi(x), x) = 0$ for all $x \in N$.

Proof. The proof applies the usual theorem, which requires that $D_1 F(y_0, x_0)$ be
one-to-one. In the present case the mapping is onto, so there are $p$ vectors in $\mathbb{R}^m$ that map to linearly
independent vectors in $\mathbb{R}^p$, and there are $m - p$ additional vectors that complete a basis for
$\mathbb{R}^m$. Let $y = y^{(p)} + y^{(m-p)}$ be the decomposition of $y \in \mathbb{R}^m$ into the subspaces spanned
by the respective sets of basis vectors. $D_1 F(y_0, x_0)$ restricted to $(\mathbb{R}^m)^{(p)}$ is one-to-one. This
fact and hypotheses 2.1 (1-4) suffice to invoke the implicit function theorem for the function
defined by $F(y^{(p)} + y_0^{(m-p)}, x)$ on the domain $\mathcal{D} \cap [(\mathbb{R}^m)^{(p)} \times \mathbb{R}^n]$. There is a neighborhood
$N$ of $x_0$ in $\mathbb{R}^n$ on which there is a continuously differentiable function $\phi : N \to (\mathbb{R}^m)^{(p)}$
such that $\phi(x_0) = y_0^{(p)}$ and $F(\phi(x) + y_0^{(m-p)}, x) = 0$. The implicit function in the lemma is
given by $\phi(x) + y_0^{(m-p)}$. □

LEMMA 2.6 (Existence of $\mu_F(x)$ and properties). Under hypotheses 2.1 (1-5), there is
a constant $L > 0$ and a neighborhood $N^{(2.6)}_{x_0}$ of $x_0$ where the function $\mu_F(x)$ of equation
(2.1) exists, and $\mu_F(x) \le L \|x - x_0\|$. Further, for any function $f$, if $f$ is asymptotically
equal to $\mu_F$ at $x_0$, then $f$ is differentially equivalent to $\mu_F$ at $x_0$.

Proof. Hypotheses 2.1 (1-5) suffice to invoke the version of the implicit function theorem
in lemma 2.5: $x_0$ has a neighborhood $N$ on which there is a continuously differentiable
function $\phi : N \to \mathbb{R}^m$ such that $(\phi(x), x)$ is always a root of $F$. Thus the minimization
problems for $\mu_F(x)$ have feasible points for all $x \in N$. The feasible sets are closed because
$F$ is continuous, so the minimal distance to $y_0$ is attained because the spaces have finite
dimension. This means $\mu_F$ is well defined on $N$. Since $\phi$ is continuously differentiable, it
is Lipschitz continuous on compact sets. Choose a compact neighborhood $N^{(2.6)}_{x_0} \subset N$ with
Lipschitz constant $L$. Thus $\mu_F(x) \le \|\phi(x) - y_0\| = \|\phi(x) - \phi(x_0)\| \le L \|x - x_0\|$ for every
$x$ in the neighborhood.

Given $\epsilon > 0$, let $N(\epsilon)$ be the neighborhood in definition 2.3 for the asymptotic equality of $f$ and $\mu_F$. If $x \in N(\epsilon) \cap
N^{(2.6)}_{x_0}$, then $(1 - \epsilon)\mu_F(x) \le f(x) \le (1 + \epsilon)\mu_F(x)$ by the equivalence, so $|f(x) - \mu_F(x)| \le
\epsilon \mu_F(x) \le \epsilon L \|x - x_0\|$ and thus the limit in equation (2.2) vanishes. □

2.3. Linearized Problems with Equivalent Minimal Perturbations. It is instructive
to compare the present situation with the implicit function theorem. Under hypotheses 2.1
(1-4) and if $D_1 F(y_0, x_0) : \mathbb{R}^m \to \mathbb{R}^p$ is invertible, then some roots of $F(y, x) = 0$ are given
by a smooth parameterization $(\phi(x), x)$. These roots can be located to first order in $x - x_0$
by considering the linearization,

(2.4)    $F(\phi(x), x) = 0 \;\Rightarrow\; [D_1 F(y_0, x_0)\, D\phi(x_0) + D_2 F(y_0, x_0)](x - x_0) = 0$.

The parameterized roots are approximated by,

    $\phi(x) - y_0 \approx -[D_1 F(y_0, x_0)]^{-1} [D_2 F(y_0, x_0)](x - x_0)$.
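As a small sanity check of this first-order approximation, consider a scalar example. This is a sketch under stated assumptions: the equation $F(y, x) = y^2 - x$ with root $(y_0, x_0) = (1, 1)$ is a hypothetical choice made here, for which $D_1 F = 2 y_0$, $D_2 F = -1$ and the exact root branch is $\phi(x) = \sqrt{x}$.

```python
import math

def phi(x):
    """Exact root branch of the hypothetical F(y, x) = y*y - x through (1, 1)."""
    return math.sqrt(x)

def phi_linear(x, y0=1.0, x0=1.0):
    """First-order estimate y0 - [D1F]^{-1} [D2F] (x - x0) with D1F = 2*y0, D2F = -1."""
    d1 = 2.0 * y0
    d2 = -1.0
    return y0 - (d2 / d1) * (x - x0)

# Pairs (h, error) for x = x0 + h; the error should shrink like h*h.
errors = [(h, abs(phi(1.0 + h) - phi_linear(1.0 + h))) for h in (1e-1, 1e-2, 1e-3)]
```

The error of the linear estimate is second order in $x - x_0$, which is what "located to first order" means here.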

In contrast, if $D_1 F(y_0, x_0) : \mathbb{R}^m \to \mathbb{R}^p$ is not invertible, the smallest change $y - y_0$ as a
function of $x$ can still be approximated from the linearizations $F^{(i)}$ of $F$ in Table 2.1.

Table 2.1
Linearizations of the function $F$ at $(y_0, x_0)$. The notation is $\Delta y = y - y_0$ and $\Delta x = x - x_0$.

    $F(y, x)$
    $F^{(1)}(y, x) = D_1 F(y_0, x)\,\Delta y + F(y_0, x)$
    $F^{(2)}(y, x) = D_1 F(y_0, x_0)\,\Delta y + F(y_0, x)$
    $F^{(3)}(y, x) = D_1 F(y_0, x_0)\,\Delta y + D_2 F(y_0, x_0)\,\Delta x$



The different linearizations have different uses. For example, $F^{(1)}$ and (2.1) do not
require $x_0$. The several approximations are treated in a progression of equivalences for $F$ and
$F^{(1)}$, then $F^{(1)}$ and $F^{(2)}$, and so on. The last $F^{(3)}$ is the full linearization (2.4) of the implicit
function theorem. The proof of asymptotic equality for $F^{(i)}$ and $F^{(i+1)}$ is carried out with
the dual mathematical programs. All the optimization problems are listed in Table 2.2, and
the network of equivalences to be established is shown in Figure 2.1.

If $F$ satisfies hypotheses 2.1 (1-5), then all the functions of Table 2.1 satisfy the same
hypotheses, so they also satisfy the conclusions of lemma 2.6.

COROLLARY 2.7 (Existence of $\mu_i(x)$ and properties). Under hypotheses 2.1 (1-5) for
$F$, for each function $F^{(i)}$ of Table 2.1 there is a constant $L_i > 0$ and a neighborhood $N^{(2.7)}_{x_0}$
of $x_0$ where the following distance function is well defined

(2.5)    $\mu_i(x) = \min_{y \,:\, F^{(i)}(y,x) = 0} \|y - y_0\|$

and $\mu_i(x) \le L_i \|x - x_0\|$. Further, for any function $f$, if $f$ is asymptotically equal to
$\mu_i$ at $x_0$, then $f$ is differentially equivalent to $\mu_i$ at $x_0$.

Proof. The linearizations satisfy the same hypotheses as $F$, so lemma 2.6 applies. □
Table 2.2
Optimization problems parameterized by $x$ and their duals. The values of problems $(P)$, $(P_1)$, $(P_2)$ are
asymptotically equal at $x_0$ under hypotheses 2.1 (1-5). The value of $(P_3)$ is differentially equivalent to the others
under these hypotheses, and is asymptotically equal under hypotheses 2.1 (1-6). See Table 7.1 for the problems in
matrix notation. In these formulas, $\Delta x = x - x_0$ and $\Delta y = y - y_0$.

name     value        function         minimization form                                                              dual, maximization form
(P)      $\mu_F(x)$   $F(y,x)$         $\min \|\Delta y\|$ over $y : F(y,x) = 0$
(P_1)    $\mu_1(x)$   $F^{(1)}(y,x)$   $\min \|\Delta y\|$ over $y : D_1 F(y_0,x)\Delta y + F(y_0,x) = 0$             $\max f(F(y_0,x))$ over $f : \|D_1 F(y_0,x)^* f\| \le 1$
(P_2)    $\mu_2(x)$   $F^{(2)}(y,x)$   $\min \|\Delta y\|$ over $y : D_1 F(y_0,x_0)\Delta y + F(y_0,x) = 0$           $\max f(F(y_0,x))$ over $f : \|D_1 F(y_0,x_0)^* f\| \le 1$
(P_3)    $\mu_3(x)$   $F^{(3)}(y,x)$   $\min \|\Delta y\|$ over $y : DF(y_0,x_0)(\Delta y, \Delta x) = 0$             $\max f(D_2 F(y_0,x_0)\Delta x)$ over $f : \|D_1 F(y_0,x_0)^* f\| \le 1$

FIG. 2.1. Where and how the equivalences of Table 2.2 are proved. [Diagram: $(P)_{\min}$ and $(P_1)_{\min}$ are
asymptotically equal (Thm. 3.8); each pair $(P_i)_{\min}$, $(P_i)_{\max}$, $i = 1, 2, 3$, is related by a duality equality (Thm. 4.4);
$(P_1)_{\max}$ and $(P_2)_{\max}$ are asymptotically equal (Thm. 5.1); $(P_2)_{\max}$ and $(P_3)_{\max}$ are differentially equivalent
or asymptotically equal depending on hypotheses (Thm. 6.2).]



3. First Equivalence, $(P)_{\min} \equiv (P_1)_{\min}$. The preparations to establish the first
equivalence are the most elaborate in this paper. Several aspects of the difference between $F$ and
the tangent function for $y$, $F^{(1)}$, are uniform in $x - x_0$: mean values, Fréchet differentials,
and level sets. The first equivalence thus requires giving a uniform parameterization to many
basic concepts in real analysis, which are indicated in Figure 3.1. The mean value theorem
and Fréchet quotient are discussed in section 3.1, the matrix lower bound is in section 3.2,
level sets are in section 3.3, and finally the proof of the first equivalence is in section 3.4.



FIG. 3.1. Dependencies for the proof of the first equivalence. [Diagram with boxes: uniformly parameterized
mean value theorem (Lemma 3.1); Lipschitz continuity of $\mu_F$ (Lemma 2.6); Lipschitz continuity of $\mu_1$
(Corollary 2.7); uniformly approximating Fréchet differential (Corollary 3.2); uniformly colocated level sets
(Lemma 3.7); matrix lower bound (Definition 3.3); uniformly bounded below partial derivatives (Lemma 3.6);
equivalence of $\mu_F$ and $\mu_1$ (Theorem 3.8).]



3.1. Uniformly Parameterized Mean Value Theorem. It is well known that if $f$ is
continuously differentiable, then for every $y_3$ and every $\epsilon > 0$ there is a neighborhood $N_{y_3}(\epsilon)$
of $y_3$ where

(3.1)    $y_1, y_2 \in N_{y_3}(\epsilon) \;\Rightarrow\; \|f(y_1) - f(y_2) - Df(y_3)(y_1 - y_2)\| \le \epsilon \|y_1 - y_2\|$.

This serves as a mean value theorem in multiple dimensions. Luenberger [4, p. 212] remarks
that it has been discussed many times. Bartle [1, p. 377] calls (3.1) the "key lemma" for
theorems like the implicit function theorem. Ortega and Rheinboldt [5, p. 72] show that (3.1)
is equivalent to the continuity of the derivative. Here, this surrogate mean value theorem is
generalized to parameterized functions.

LEMMA 3.1 (Uniformly parameterized mean value theorem). Under hypotheses 2.1 (1-5),
for every $\epsilon > 0$ there is a neighborhood $N^{(3.1)}_{y_0}(\epsilon) \times N^{(3.1)}_{x_0}(\epsilon) \subset \mathcal{D}$ such that for all
$y_1, y_2, y_3 \in N^{(3.1)}_{y_0}(\epsilon)$ and $x \in N^{(3.1)}_{x_0}(\epsilon)$,

(3.2)    $\|F(y_1, x) - F(y_2, x) - D_1 F(y_3, x)(y_1 - y_2)\| \le \epsilon \|y_1 - y_2\|$.



Proof. The topology of the product space $\mathbb{R}^m \times \mathbb{R}^n$ can be generated from the
products of the open sets, so it is possible to choose a compact, convex neighborhood $Y$ around
$y_0$, and a compact neighborhood $X$ around $x_0$, so that $Y \times X \subset \mathcal{D}$. All norms for a
finite dimensional space generate the same topology, so without loss of generality let the
norm for $\mathbb{R}^m \times \mathbb{R}^m \times \mathbb{R}^n$ be $\max\{\|y_1\|, \|y_2\|, \|x\|\}$. Since $D_1 F(y, x)$ is continuous, hence
$g(y_1, y_2, x) = D_1 F(y_1, x) - D_1 F(y_2, x)$ is uniformly continuous on the compact set $K =
Y \times Y \times X$. The uniform continuity means, for every $\epsilon > 0$ there is a $\delta(\epsilon) > 0$ so that
if $(y_1, y_2, x), (y_1', y_2', x') \in K$ with $\max\{\|y_1 - y_1'\|, \|y_2 - y_2'\|, \|x - x'\|\} < \delta(\epsilon)$, then
$\|g(y_1, y_2, x) - g(y_1', y_2', x')\| < \epsilon$.

Choose the neighborhoods in the statement of the lemma to be $N^{(3.1)}_{y_0}(\epsilon) = B_{y_0}(\delta(\epsilon)) \cap
Y$ and $N^{(3.1)}_{x_0}(\epsilon) = B_{x_0}(\delta(\epsilon)) \cap X$. Note, these sets are convex. If $y_1, y_2, y_3$ and $x$ are from
the respective sets, then, since $g(y_0, y_0, x_0) = 0$,

    $\|D_1 F(t y_1 + (1 - t) y_2, x) - D_1 F(y_3, x)\|$
        $= \|g(t y_1 + (1 - t) y_2, y_3, x)\|$
        $= \|g(t y_1 + (1 - t) y_2, y_3, x) - g(y_0, y_0, x_0)\|$
        $< \epsilon$.
It is well known from [1, p. 376, lemma 41.3] or from [5, p. 70, lemma 3.2.5] that if $D \subset
\mathbb{R}^m$ is a convex, open set, and if $f : D \to \mathbb{R}^p$ is continuously differentiable, then for any
$y_1, y_2, y_3 \in D$,

    $\|f(y_1) - f(y_2) - Df(y_3)(y_1 - y_2)\| \le \sup_{0 \le t \le 1} \|Df(t y_1 + (1 - t) y_2) - Df(y_3)\| \, \|y_1 - y_2\|$.

Applying this inequality to the parameterized function $F(y, x)$ for the previously chosen $y_1$,
$y_2$, $y_3$, $x$ gives

    $\|F(y_1, x) - F(y_2, x) - D_1 F(y_3, x)(y_1 - y_2)\|$
        $\le \sup_{0 \le t \le 1} \|D_1 F(t y_1 + (1 - t) y_2, x) - D_1 F(y_3, x)\| \, \|y_1 - y_2\|$
        $\le \epsilon \|y_1 - y_2\|$.  □

Lemma 3.1 gives conditions under which the Fréchet differential for $y$ is uniformly
approximating with respect to the parameter $x$.

COROLLARY 3.2 (Uniformly approximating differential). The neighborhoods of lemma
3.1 also satisfy, for all $y \in N^{(3.1)}_{y_0}(\epsilon)$ and $x \in N^{(3.1)}_{x_0}(\epsilon)$,

(3.3)    $\|F(y, x) - F^{(1)}(y, x)\| \le \epsilon \|y - y_0\|$,

where $F^{(1)}(y, x)$ is the parameterized tangent function of Table 2.1.

Proof. Choose $y_1 = y$, $y_2 = y_0$ and $y_3 = y_0$ so that the formula in equation (3.2)
becomes

    $F(y_1, x) - F(y_2, x) - D_1 F(y_3, x)(y_1 - y_2) = F(y, x) - F(y_0, x) - D_1 F(y_0, x)(y - y_0)$
        $= F(y, x) - F^{(1)}(y, x)$.  □

3.2. Matrix Lower Bound. The matrix lower bound, $\|A\|_\ell$, is analogous to the matrix
norm but with reversed inequalities. The following are from [3, p. 205, def. 2.1 and lem. 2.2;
p. 212, cor. 4.3].

DEFINITION 3.3 (Matrix lower bound). Let $A$ be a nonzero matrix. The matrix lower
bound, $\|A\|_\ell$, is the largest of the numbers, $m$, such that for every $y$ in the column space of
$A$, there is some $x$ with $Ax = y$ and $m \|x\| \le \|y\|$.

LEMMA 3.4. The matrix lower bound exists and is positive for every nonzero matrix.

LEMMA 3.5. The matrix lower bound is continuous on the open set of full rank matrices.
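Definition 3.3 can be worked out by hand in the simplest case. The sketch below assumes a hypothetical $1 \times 2$ matrix $A = [a, b]$ with 2-norms on both spaces (choices made here for illustration only). By the Cauchy-Schwarz inequality every preimage $x$ of $y$ has $\|x\| \ge |y| / \sqrt{a^2 + b^2}$, and the minimum-norm preimage attains equality, so in this special case the largest admissible $m$ is $\sqrt{a^2 + b^2}$.

```python
import math

def min_norm_preimage(a, b, y):
    """Smallest 2-norm x in R^2 with a*x1 + b*x2 = y, for the row matrix A = [a, b]."""
    s = a * a + b * b
    return (a * y / s, b * y / s)

def lower_bound_row(a, b):
    """Matrix lower bound of A = [a, b] for 2-norms in this special case."""
    return math.hypot(a, b)

a, b = 3.0, 4.0          # hypothetical entries
m = lower_bound_row(a, b)
checks = []
for y in (-2.0, 0.5, 7.0):
    x = min_norm_preimage(a, b, y)
    # record (y, A x, m * ||x||); the definition needs A x = y and m * ||x|| <= |y|
    checks.append((y, a * x[0] + b * x[1], m * math.hypot(x[0], x[1])))
```

For each sampled $y$ the product $m \|x\|$ equals $|y|$ up to rounding, so no larger $m$ would satisfy the definition.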

The present use of the lower bound is in the following lemma. 

LEMMA 3.6 (Uniform lower bounds for partial derivatives). Under hypotheses 2.1 (1-5),
there is a neighborhood $N^{(3.6)}_{x_0}$ of $x_0$ where $D_1 F(y_0, x) : \mathbb{R}^m \to \mathbb{R}^p$ is onto for every
$x \in N^{(3.6)}_{x_0}$. There is also a number $m^{(3.6)} > 0$ such that every $x \in N^{(3.6)}_{x_0}$ and $u \in \mathbb{R}^p$ have
some $w \in \mathbb{R}^m$ (which depends on $x$ and $u$) so that $D_1 F(y_0, x) w = u$ and $m^{(3.6)} \|w\| \le \|u\|$.

Proof. Choose some bases for $\mathbb{R}^m$ and $\mathbb{R}^p$ so that these spaces are represented by real
column vectors. The linear transformations $D_1 F(y_0, x)$ are then represented by $p \times m$
matrices, $A(x)$. By Hypothesis 2.1 (3) $F$ is continuously differentiable and (5) $D_1 F(y_0, x_0)$
is onto, which mean $A(x)$ is a continuous function of $x$ and the column space of $A(x_0)$ is
all of $\mathbb{R}^p$, or equivalently $A(x_0)$ has full row rank. For a matrix $M$ to have full row rank
means $\det(M M^t)$ does not vanish. The determinant is a continuous function of the matrix,
so $A(x_0)$ has a neighborhood of matrices $N_{A(x_0)}$, all of which have full row rank. From the
continuity of $A(x)$, there is a neighborhood $N_{x_0}$ for which all matrices $A(x)$ lie in $N_{A(x_0)}$. Hence
for all $x \in N_{x_0}$ the mappings $D_1 F(y_0, x)$ are onto, or equivalently the column space of each
matrix $A(x)$ is all of $\mathbb{R}^p$.

Choose a compact neighborhood $N^{(3.6)}_{x_0} \subset N_{x_0}$. Since $\|A(x)\|_\ell$ is continuous and
positive on $N_{x_0}$ by lemmas 3.4 and 3.5, $\|A(x)\|_\ell$ is uniformly bounded below on $N^{(3.6)}_{x_0}$ by some
$m^{(3.6)} > 0$. If $x \in N^{(3.6)}_{x_0}$ and $u \in \mathbb{R}^p$, then since the column space of $A(x)$ is all of $\mathbb{R}^p$, by
definition 3.3 there is $w \in \mathbb{R}^m$ so $D_1 F(y_0, x) w = A(x) w = u$ and $\|A(x)\|_\ell \|w\| \le \|u\|$.
Further, $m^{(3.6)} \le \|A(x)\|_\ell$ by the choice of $N^{(3.6)}_{x_0}$. □

3.3. Uniformly Colocated Level Sets. Suppose $D$ is an open set in $\mathbb{R}^m$, on which
$f : D \to \mathbb{R}^p$ is continuously differentiable. By analogy with real-valued functions, the
set $f^{-1}(a)$ may be called a level set of $f$. It is possible to make a geometric comparison
between the level sets of $f$ and those of its tangent function at $y_0$. For functions such as
$F$ that vary smoothly with a parameter, the distance between the corresponding level sets is
uniformly bounded with respect to changes in the parameter. The proof is a modification of a
construction apparently due to L. M. Graves [2], see also [1, p. 378, theorem 41.6].

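LEMMA 3.7 (Uniformly colocated level sets). Under hypotheses 2.1 (1-5), for every
$\epsilon > 0$ there is a radius $r(\epsilon) > 0$ and a neighborhood $N^{(3.7)}_{x_0}(\epsilon)$ of $x_0$ so $\mathrm{cl}(B_{y_0}(r(\epsilon))) \times
N^{(3.7)}_{x_0}(\epsilon) \subset \mathcal{D}$. For each pair $(y, x) \in B_{y_0}(r(\epsilon)/(1 + \epsilon)) \times N^{(3.7)}_{x_0}(\epsilon)$:

         (a) there exists                                 (b) with                               (c) and with
    (1)  $y_1 \in B_{y_0}(r(\epsilon))$                   $F^{(1)}(y_1, x) = F(y, x)$            $\|y_1 - y\| \le \epsilon \|y - y_0\|$
    (2)  $y_F \in \mathrm{cl}(B_{y_0}(r(\epsilon)))$      $F(y_F, x) = F^{(1)}(y, x)$            $\|y_F - y\| \le \epsilon \|y - y_0\|$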

Proof. Lemma 3.7 has the first of the two most complicated proofs in this paper. Let
$\delta = \epsilon/(1 + \epsilon) < 1$. Let $m^{(3.6)}$ be the lower bound for the neighborhood $N^{(3.6)}_{x_0}$ in lemma 3.6.
Choose a radius $r(\epsilon) > 0$ so that

(3.4)    $\mathrm{cl}(B_{y_0}(r(\epsilon))) \subset N^{(3.1)}_{y_0}(\delta m^{(3.6)})$.

The neighborhoods from which the lemma is allowed to choose $y$ and $x$ are

(3.5)    $y \in B_{y_0}(r(\epsilon)/(1 + \epsilon)) \subset \mathrm{cl}(B_{y_0}(r(\epsilon))) \subset N^{(3.1)}_{y_0}(\delta m^{(3.6)})$,

(3.6)    $x \in N^{(3.7)}_{x_0}(\epsilon) := N^{(3.6)}_{x_0} \cap N^{(3.1)}_{x_0}(\delta m^{(3.6)}) \subset N^{(3.1)}_{x_0}(\delta m^{(3.6)})$.

Note the product $B_{y_0}(r(\epsilon)) \times N^{(3.7)}_{x_0}(\epsilon)$ is a subset of $N^{(3.1)}_{y_0}(\delta m^{(3.6)}) \times N^{(3.1)}_{x_0}(\delta m^{(3.6)})$ in
$\mathcal{D}$ by lemma 3.1.

(Part 1.) Because $D_1 F(y_0, x) : \mathbb{R}^m \to \mathbb{R}^p$ is onto, the range of the transformation
contains the vector $F(y, x) - F^{(1)}(y, x)$, and because $x \in N^{(3.6)}_{x_0}$ by (3.6), lemma 3.6 finds
a $\hat{y}$ with

(3.7)    $D_1 F(y_0, x)\, \hat{y} = F(y, x) - F^{(1)}(y, x)$,

(3.8)    and $m^{(3.6)} \|\hat{y}\| \le \|F(y, x) - F^{(1)}(y, x)\|$.

Let $y_1 = y + \hat{y}$ so $\hat{y} = y_1 - y$. The equality (3.7) and some algebra imply

    $F^{(1)}(y_1, x) = D_1 F(y_0, x)(y_1 - y_0) + F(y_0, x)$                                   by definition of $F^{(1)}$ in Table 2.1
        $= [D_1 F(y_0, x)(y_1 - y)] + [D_1 F(y_0, x)(y - y_0) + F(y_0, x)]$                    inserting $\pm y$
        $= [F(y, x) - F^{(1)}(y, x)] + F^{(1)}(y, x)$                                          by (3.7) and by definition of $F^{(1)}$
        $= F(y, x)$
which is part (1b). Further, writing $\hat{y} = y_1 - y$ for the vector in (3.8),

    $\|y_1 - y\| = \|\hat{y}\| \le \dfrac{\|F(y, x) - F^{(1)}(y, x)\|}{m^{(3.6)}}$        from (3.8)
        $\le \dfrac{(\delta m^{(3.6)}) \|y - y_0\|}{m^{(3.6)}}$                            by (3.3) and $y \in N^{(3.1)}_{y_0}(\delta m^{(3.6)})$ in (3.5)
        $= \delta \|y - y_0\|$
        $\le \epsilon \|y - y_0\|$                                                         by the choice of $\delta$

which is part (1c). Finally,

    $\|y_1 - y_0\| \le \|y_1 - y\| + \|y - y_0\|$
        $\le \epsilon \|y - y_0\| + \|y - y_0\|$        from (1c)
        $= (1 + \epsilon) \|y - y_0\|$
        $< r(\epsilon)$                                  from the choice $y \in B_{y_0}(r(\epsilon)/(1 + \epsilon))$ in (3.5).

Therefore $y_1 \in B_{y_0}(r(\epsilon))$, which is part (1a).

(Part 2.) Let $y_1 = y$ from (3.5). This $y_1$ and $y_0$ begin a sequence $\{y_n\}$ to be built subject
to the conditions:

    $(1_n)$    $\|y_{n+1} - y_n\| \le \delta^n \|y - y_0\|$,

    $(2_n)$    $\|F(y_{n+1}, x) - F^{(1)}(y, x)\| \le (\delta m^{(3.6)}) \|y_{n+1} - y_n\|$.

Condition $(1_0)$ is just $\|y - y_0\| \le \|y - y_0\|$. Condition $(2_0)$ is (3.3) in corollary 3.2 which is
applicable by the choices of $y_1 = y$ and $x$ in equations (3.5) and (3.6).

Suppose $y_0, y_1, \ldots, y_k$ have been constructed to satisfy $(1_n)$ and $(2_n)$ for $0 \le n \le
k - 1$. The selection of $y_{k+1}$ proceeds as for $y_1$ in the first half of the proof. Again because
$D_1 F(y_0, x) : \mathbb{R}^m \to \mathbb{R}^p$ is onto, the range of the transformation contains $-[F(y_k, x) - F^{(1)}(y, x)]$, and
because $x \in N^{(3.6)}_{x_0}$ by (3.6), it is possible to invoke lemma 3.6 to find a $\hat{y}$ with

(3.9)     $D_1 F(y_0, x)\, \hat{y} = -[F(y_k, x) - F^{(1)}(y, x)]$,

(3.10)    and $m^{(3.6)} \|\hat{y}\| \le \|F(y_k, x) - F^{(1)}(y, x)\|$.

Let $y_{k+1} = \hat{y} + y_k$ so $\hat{y} = y_{k+1} - y_k$. For this choice of $y_{k+1}$,

    $\|y_{k+1} - y_k\| \le \dfrac{\|F(y_k, x) - F^{(1)}(y, x)\|}{m^{(3.6)}}$        from (3.10)
        $\le \dfrac{(\delta m^{(3.6)}) \|y_k - y_{k-1}\|}{m^{(3.6)}}$                from $(2_{k-1})$
        $= \delta \|y_k - y_{k-1}\|$
        $\le \delta^k \|y - y_0\|$                                                   from $(1_{k-1})$,

which is $(1_k)$. Summing $(1_n)$ for $0 \le n \le k$ gives

    $\|y_{k+1} - y_0\| \le \sum_{n=0}^{k} \|y_{n+1} - y_n\| \le \dfrac{1 - \delta^{k+1}}{1 - \delta} \|y - y_0\| < (1 + \epsilon) \|y - y_0\|$,

which easily follows from the choice $\delta = \epsilon/(1 + \epsilon)$, for which $1/(1 - \delta) = 1 + \epsilon$. This inequality combines with $y \in
B_{y_0}(r(\epsilon)/(1 + \epsilon))$ to place $y_{k+1} \in B_{y_0}(r(\epsilon)) \subset N^{(3.1)}_{y_0}(\delta m^{(3.6)})$ from equation (3.4), and
then $(y_{k+1}, x) \in \mathcal{D}$. Thus, the evaluation of $F(y_{k+1}, x)$ is well defined. Further,

    $\|F(y_{k+1}, x) - F^{(1)}(y, x)\|$
        $= \|F(y_{k+1}, x) - F(y_k, x) - \{-[F(y_k, x) - F^{(1)}(y, x)]\}\|$        inserting $\pm F(y_k, x)$
        $= \|F(y_{k+1}, x) - F(y_k, x) - D_1 F(y_0, x)(y_{k+1} - y_k)\|$            from (3.9)
        $\le (\delta m^{(3.6)}) \|y_{k+1} - y_k\|$                                   from (3.2),

which is $(2_k)$.

In this way a sequence $\{y_n\} \subset B_{y_0}(r(\epsilon))$ is constructed that satisfies conditions $(1_n)$
and $(2_n)$ for all $n$. The sequence is a Cauchy sequence by $(1_n)$, so it has a limit $y_F \in
\mathrm{cl}(B_{y_0}(r(\epsilon)))$, which is part (2a). Passing to the limit in $(2_n)$ shows $F(y_F, x) = F^{(1)}(y, x)$,
which is part (2b). Summing $(1_n)$, now for $1 \le n \le k$, gives

    $\|y_{k+1} - y\| = \|y_{k+1} - y_1\| \le \sum_{n=1}^{k} \|y_{n+1} - y_n\| \le \delta\, \dfrac{1 - \delta^k}{1 - \delta} \|y - y_0\|$,

which in the limit becomes (2c), $\|y_F - y\| \le \delta (1 - \delta)^{-1} \|y - y_0\| = \epsilon \|y - y_0\|$. □
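The construction in Part 2 of the proof is an iteration with a fixed derivative, and it can be tried on a small example. The sketch below rests on assumptions made here for illustration: the hypothetical equation $F(y, x) = y_1^2 + y_2^2 - x$ with root $(y_0, x_0) = ((1, 0), 1)$, 2-norms, and the minimum 2-norm solution of (3.9) as one admissible choice of the correction (lemma 3.6 only requires some correction of bounded size).

```python
import math

Y0 = (1.0, 0.0)   # hypothetical root y0, with x0 = 1
K = (2.0, 0.0)    # D1F(y0, x) = [2*y0_1, 2*y0_2]; independent of x in this example

def F(y, x):
    """Hypothetical example F(y, x) = y1^2 + y2^2 - x."""
    return y[0] ** 2 + y[1] ** 2 - x

def F1(y, x):
    """Tangent function F^(1)(y, x) = D1F(y0, x)(y - y0) + F(y0, x)."""
    return K[0] * (y[0] - Y0[0]) + K[1] * (y[1] - Y0[1]) + F(Y0, x)

def level_point(y, x, steps=50):
    """Iterate y_{k+1} = y_k + yhat where K yhat = -(F(y_k, x) - F1(y, x))."""
    target = F1(y, x)
    kk = K[0] ** 2 + K[1] ** 2
    yk = y  # the sequence starts at y_1 = y
    for _ in range(steps):
        residual = F(yk, x) - target
        # minimum 2-norm solution of K yhat = -residual (an assumed choice)
        yhat = (-residual * K[0] / kk, -residual * K[1] / kk)
        yk = (yk[0] + yhat[0], yk[1] + yhat[1])
    return yk

y, x = (1.1, 0.05), 1.05
yF = level_point(y, x)
moved = math.hypot(yF[0] - y[0], yF[1] - y[1])       # ||y_F - y||
offset = math.hypot(y[0] - Y0[0], y[1] - Y0[1])      # ||y - y0||
```

In this example the limit point satisfies $F(y_F, x) = F^{(1)}(y, x)$ to rounding error, and `moved` is a small fraction of `offset`, in the spirit of parts (2b) and (2c).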

3.4. Proof of the First Equivalence.

THEOREM 3.8 ($(P)_{\min} \equiv (P_1)_{\min}$). Under hypotheses 2.1 (1-5), there is a
neighborhood of $x_0$ where both optimization problems $(P)_{\min}$ and $(P_1)_{\min}$ of Table 2.2 are well
defined. Their values are asymptotically equal at $x_0$ in the sense of definition 2.3.

Proof. By lemma 2.6, $x_0$ has a neighborhood $N^{(2.6)}_{x_0}$ where problem $(P)_{\min}$ is well
defined for every $x \in N^{(2.6)}_{x_0}$, and the optimal value, $\mu_F(x)$, is Lipschitz continuous at $x_0$
with constant $L$.

By corollary 2.7 similarly, $x_0$ has a neighborhood $N^{(2.7)}_{x_0}$ where problem $(P_1)_{\min}$ is
well defined for every $x \in N^{(2.7)}_{x_0}$, and the optimal value, $\mu_1(x)$, is Lipschitz continuous at
$x_0$ with constant $L_1$.

Let $B_{y_0}(r(\epsilon)/(1 + \epsilon)) \times N^{(3.7)}_{x_0}(\epsilon)$ be the neighborhood of $(y_0, x_0)$ in lemma 3.7, and let

    $N(\epsilon) = N^{(2.6)}_{x_0} \cap N^{(2.7)}_{x_0} \cap N^{(3.7)}_{x_0}(\epsilon) \cap B_{x_0}\!\left(\min\{L^{-1}, L_1^{-1}\}\, \dfrac{r(\epsilon)}{1 + \epsilon}\right)$.

Note the ball in this formula is around $x_0$ rather than $y_0$.

Suppose $x \in N(\epsilon)$. Let $\mu_F(x)$ be attained at $y$. By lemma 2.6 and $x \in B_{x_0}(L^{-1} r(\epsilon)
/(1 + \epsilon))$, therefore

    $\|y - y_0\| = \mu_F(x) \le L \|x - x_0\| < r(\epsilon)/(1 + \epsilon)$,

which places $(y, x) \in B_{y_0}(r(\epsilon)/(1 + \epsilon)) \times N^{(3.7)}_{x_0}(\epsilon)$. Part 1 of lemma 3.7 now asserts there
is a $y_1 \in B_{y_0}(r(\epsilon))$ with

    $\|y_1 - y\| \le \epsilon \|y - y_0\|$  and  $F^{(1)}(y_1, x) = F(y, x) = 0$.

Thus

    $\mu_1(x) \le \|y_1 - y_0\| \le \|y_1 - y\| + \|y - y_0\| \le (1 + \epsilon) \|y - y_0\| = (1 + \epsilon)\, \mu_F(x)$

which is the upper side of (2.3) in definition 2.3. The inequality with $\mu_F$ and $\mu_1$ exchanged is
established by the same argument using $L_1$ instead of $L$, corollary 2.7 instead of lemma 2.6,
and lemma 3.7 part 2 instead of part 1. The two upper-side inequalities imply (2.3), because
$\mu_F(x) \le (1 + \epsilon)\mu_1(x)$ gives $\mu_1(x) \ge (1 + \epsilon)^{-1} \mu_F(x) \ge (1 - \epsilon)\mu_F(x)$. □
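The first equivalence can be observed on an example with closed-form distances. This is a sketch under assumptions made here: the hypothetical equation $F(y, x) = y_1^2 + y_2^2 - x$ with root $(y_0, x_0) = ((1, 0), 1)$ and the 2-norm. The roots form a circle of radius $\sqrt{x}$, so $\mu_F(x) = |\sqrt{x} - 1|$, while $F^{(1)}(y, x) = 2(y_1 - 1) + (1 - x)$ vanishes on a line at distance $|1 - x|/2$ from $y_0$.

```python
import math

def mu_F(x):
    """Distance from y0 = (1, 0) to the circle y1^2 + y2^2 = x (hypothetical F)."""
    return abs(math.sqrt(x) - 1.0)

def mu_1(x):
    """Distance from y0 to the line 2*(y1 - 1) + (1 - x) = 0, where F^(1) vanishes."""
    return abs(1.0 - x) / 2.0

# Ratios mu_F / mu_1 at x = x0 + h; asymptotic equality means they tend to 1.
steps = (1e-1, 1e-2, 1e-3, -1e-3)
ratios = [mu_F(1.0 + h) / mu_1(1.0 + h) for h in steps]
```

The ratio equals $2/(\sqrt{x} + 1)$ in exact arithmetic, so it approaches 1 from either side as $x \to x_0$, as (2.3) requires.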



4. Equalities for the Dual Problems. The duality theory for best linear approximation
guarantees that the three pairs of dual problems in Table 2.2 have equal values. Equalities
like these are well known and can be established in many ways. These are derived from
the following duality theorem that Luenberger [4, p. 119, thm. 1] proves directly from the
Hahn-Banach theorem.

THEOREM 4.1 (Best linear approximation). If $S$ is a subspace and $y_0$ is an element of a
real, normed linear space, then

    $\inf_{y \in S} \|y - y_0\| = \max_{f \in S^\perp,\ \|f\| \le 1} f(y_0)$.

COROLLARY 4.2 (Best affine approximation). If $A$ is an affine subspace and $y_0$ is an
element of a real, normed linear space, then

    $\inf_{y \in A} \|y - y_0\| = \max_{f \in (A - a)^\perp,\ \|f\| \le 1} f(y_0 - a)$

in which $a$ is any element of $A$.

Proof. Replace $y$, $y_0$, $S$ in theorem 4.1 by $y - a$, $y_0 - a$, $A - a$. □

COROLLARY 4.3. Let $T : \mathbb{R}^m \to \mathbb{R}^p$ be a linear transformation. For every $y_0 \in \mathbb{R}^m$,
each optimization problem below is well defined if and only if $h \in T(\mathbb{R}^m)$, in which case the
optimal values are equal.

    $\min_{y \in \mathbb{R}^m \,:\, Ty = h} \|y - y_0\| = \max_{g \in (\mathbb{R}^p)^* \,:\, \|T^* g\| \le 1} g(T y_0 - h)$

Proof. The minimization is well-posed whenever $h$ is in the image of $T$. The same can
be proved for the maximization. If $h \in T(\mathbb{R}^m)$, then $h = Tu$ for some $u$, so the objective
function,

    $g(T y_0 - h) = gT(y_0 - u) = (T^* g)(y_0 - u) \le \|T^* g\| \, \|y_0 - u\| \le \|y_0 - u\|$,

is bounded above for every feasible $g \in (\mathbb{R}^p)^*$. The maximum is attained because the feasible set is
closed in a finite dimensional space.

Conversely, suppose the maximization is well posed. If $g \in T(\mathbb{R}^m)^\perp = \ker(T^*)$, then $g$
and all its multiples are feasible. Hence $g(h) = 0$, lest by scaling $g$ it would be possible to
make $g(T y_0 - h) = -g(h)$ arbitrarily large. Thus $h \in {}^\perp[T(\mathbb{R}^m)^\perp] = T(\mathbb{R}^m)$.

All that remains is to establish the equality using corollary 4.2. Choose $A = \{y \in \mathbb{R}^m :
Ty = h\}$ and $a \in A$. Now $A - a = \ker(T)$, so

    $(A - a)^\perp = [\ker(T)]^\perp = [{}^\perp(T^* (\mathbb{R}^p)^*)]^\perp = T^* (\mathbb{R}^p)^*$.

This means $f \in (A - a)^\perp$ if and only if $f = T^* g$ for some $g \in (\mathbb{R}^p)^*$. Thus the maximization
in corollary 4.2 is over all such $g$ with $\|T^* g\| = \|f\| \le 1$. Finally, the objective function is

    $f(y_0 - a) = (T^* g)(y_0 - a) = gT(y_0 - a) = g(T y_0 - Ta) = g(T y_0 - h)$.  □

THEOREM 4.4 ($(P_i)_{\min} = (P_i)_{\max}$, $i = 1, 2, 3$). Under hypotheses 2.1 (1-5), there is a
neighborhood of $x_0$ where problems $(P_1)_{\min}$ and $(P_1)_{\max}$ of Table 2.2 are well defined and
their values are equal, and similarly for the $(P_2)$ and $(P_3)$ pairs of dual problems.

Proof. By lemma 3.6, $D_1 F(y_0, x)$ is onto for every $x \in N^{(3.6)}_{x_0}$. Therefore by corollary
4.3 the following problems are well defined and their values are equal for every $h \in \mathbb{R}^p$.

    $\min_{y \,:\, D_1 F(y_0, x) y - h = 0} \|y - y_0\| = \max_{f \,:\, \|D_1 F(y_0, x)^* f\| \le 1} f(D_1 F(y_0, x) y_0 - h)$

Choosing $h = D_1 F(y_0, x) y_0 - F(y_0, x)$ gives the conclusion of the theorem for the $(P_1)$
dual problems.

In particular $D_1 F(y_0, x_0)$ is onto, so also by corollary 4.3 the following problems are
well defined and their optimal values are equal for every $h \in \mathbb{R}^p$.

    $\min_{y \,:\, D_1 F(y_0, x_0) y - h = 0} \|y - y_0\| = \max_{f \,:\, \|D_1 F(y_0, x_0)^* f\| \le 1} f(D_1 F(y_0, x_0) y_0 - h)$

The choice $h = D_1 F(y_0, x_0) y_0 - F(y_0, x)$ gives the conclusion for the $(P_2)$ dual problems;
similarly $h = D_1 F(y_0, x_0) y_0 - D_2 F(y_0, x_0)(x - x_0)$ for the $(P_3)$ problems. □
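The duality equality is easy to verify by hand when there is a single equation. The sketch below assumes $p = 1$ and the 2-norm, with a hypothetical row vector `k` standing in for $D_1 F(y_0, x)$ and a scalar `r` standing in for $F(y_0, x)$. The primal value is the distance from the origin to a hyperplane, and the dual feasible set is an interval, so the linear dual objective is maximized at an endpoint.

```python
import math

def primal_value(k, r):
    """min ||dy||_2 subject to k . dy + r = 0, for a nonzero row vector k."""
    return abs(r) / math.sqrt(sum(c * c for c in k))

def dual_value(k, r):
    """max f*r subject to ||k^T f||_2 <= 1, where f is a scalar because p = 1."""
    bound = 1.0 / math.sqrt(sum(c * c for c in k))  # feasible f satisfy |f| <= bound
    return max(f * r for f in (-bound, bound))      # linear objective: check endpoints

k, r = (1.0, 2.0, 2.0), -0.6   # hypothetical data
p_val = primal_value(k, r)
d_val = dual_value(k, r)
```

Both values equal $|r| / \|k\|_2$, here $0.6/3$, which matches the equality asserted for the $(P_i)$ pairs in this special case.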

5. Second Equivalence, $(P_1)_{\max} \equiv (P_2)_{\max}$. The second equivalence to be proved
(in the notation of Table 2.2) says that the feasible set $\{f : \|D_1 F(y_0, x)^* f\| \le 1\}$ can be
replaced by one that is independent of $x$. The proof is self-contained and is the second of the
two most complicated proofs in this paper.

THEOREM 5.1 ($(P_1)_{\max} \equiv (P_2)_{\max}$). Under hypotheses 2.1 (1-5), there is a
neighborhood of $x_0$ where both optimization problems $(P_1)_{\max}$ and $(P_2)_{\max}$ of Table 2.2 are well
defined. Their values are asymptotically equal at $x_0$ in the sense of definition 2.3.

Proof. The hypotheses suffice to invoke theorem 4.4 which says $(P_1)_{\max}$ and $(P_2)_{\max}$
are well defined on some neighborhood $N^{(1)}$ of $x_0$. The feasible sets are given by

    $C(x) = \{f \in (\mathbb{R}^p)^* : \|D_1 F(y_0, x)^* f\| \le 1\}$

    $\mu_1(x)_{\max} = \max_{f \in C(x)} f(F(y_0, x))$,        $\mu_2(x)_{\max} = \max_{f \in C(x_0)} f(F(y_0, x))$.
The proof has three steps that culminate in equations (5.2), (5.3) and (5.4), respectively.
(Step 1.) If $f_1 \in \mathrm{bd}(C(x_0)) = \{f : \|D_1 F(y_0, x_0)^* f\| = 1\}$, then

          $\big|\, \|D_1 F(y_0, x)^* f_1\| - 1 \,\big| = \big|\, \|D_1 F(y_0, x)^* f_1\| - \|D_1 F(y_0, x_0)^* f_1\| \,\big|$
              $\le \|D_1 F(y_0, x)^* f_1 - D_1 F(y_0, x_0)^* f_1\|$
(5.1)         $\le \|D_1 F(y_0, x)^* - D_1 F(y_0, x_0)^*\| \, \|f_1\|$
              $= \|D_1 F(y_0, x) - D_1 F(y_0, x_0)\| \, \|f_1\|$
              $\le \|D_1 F(y_0, x) - D_1 F(y_0, x_0)\| \, \max_{f \in \mathrm{bd}(C(x_0))} \|f\|$.

The linear transformation $D_1 F(y_0, x_0)$ is onto, so its adjoint $D_1 F(y_0, x_0)^*$ is one-to-one.
Hence $\|D_1 F(y_0, x_0)^* f\|$ defines a norm on the dual space whose closed unit ball is $C(x_0)$.
Thus, in the last bound of equation (5.1), the maximum is finite because $C(x_0)$ is compact.
There also, the difference term converges to 0 as $x \to x_0$ because $F$ is continuously
differentiable. Altogether, $\|D_1 F(y_0, x)^* f_1\|$ converges to 1 uniformly on $\mathrm{bd}(C(x_0))$ as $x \to x_0$.
This means, for every $\epsilon > 0$, there is a neighborhood $N^{(2)}(\epsilon)$ of $x_0$, such that

(5.2)    $x \in N^{(2)}(\epsilon)$ and $f_1 \in \mathrm{bd}(C(x_0))$  $\Rightarrow$  $1 - \epsilon \le \|D_1 F(y_0, x)^* f_1\| \le 1 + \epsilon$.

(Step 2.) Choose $x \in N^{(1)} \cap N^{(2)}(\epsilon)$, and then choose any nonzero $f \in C(x)$, and finally
let $f_1 = f / \|D_1 F(y_0, x_0)^* f\| \in \mathrm{bd}(C(x_0))$. Assume without loss of generality that $\epsilon < 1$. It
is now possible to calculate

    $\|(1 - \epsilon) D_1 F(y_0, x_0)^* f\| = (1 - \epsilon) \|D_1 F(y_0, x_0)^* f\|$
        $\le \|D_1 F(y_0, x)^* f_1\| \, \|D_1 F(y_0, x_0)^* f\|$        from (5.2)
        $= \|D_1 F(y_0, x)^* f\|$
        $\le 1$                                                          because $f \in C(x)$.

This proves $(1 - \epsilon) f \in C(x_0)$. Similarly, choose any nonzero $f_0 \in C(x_0)$ and let $f_1 =
f_0 / \|D_1 F(y_0, x_0)^* f_0\| \in \mathrm{bd}(C(x_0))$. It now follows that

    $\|(1 + \epsilon)^{-1} D_1 F(y_0, x)^* f_0\| = (1 + \epsilon)^{-1} \|D_1 F(y_0, x)^* f_0\|$
        $= (1 + \epsilon)^{-1} \|D_1 F(y_0, x)^* f_1\| \, \|D_1 F(y_0, x_0)^* f_0\|$
        $\le \|D_1 F(y_0, x_0)^* f_0\|$        from equation (5.2)
        $\le 1$                                 because $f_0 \in C(x_0)$.

This proves $(1 + \epsilon)^{-1} f_0 \in C(x)$. These two calculations establish the next implication.

(5.3)    $x \in N^{(1)} \cap N^{(2)}(\epsilon)$  $\Rightarrow$  $(1 - \epsilon)\, C(x) \subset C(x_0) \subset (1 + \epsilon)\, C(x)$

(Step 3.) Choose $x \in N^{(1)} \cap N^{(2)}(\epsilon)$, and then choose $f_1 \in C(x)$ that attains $\mu_1(x)_{\max}$.
Equation (5.3) asserts $(1 - \epsilon) f_1 \in C(x_0)$, so

    $\mu_2(x)_{\max} = \max_{f \in C(x_0)} f(F(y_0, x)) \ge (1 - \epsilon) f_1(F(y_0, x)) = (1 - \epsilon)\, \mu_1(x)_{\max}$.

Similarly, choose $f_2 \in C(x_0)$ that attains $\mu_2(x)_{\max}$. Now equation (5.3) asserts $(1 + \epsilon)^{-1} f_2 \in
C(x)$, so

    $\mu_1(x)_{\max} = \max_{f \in C(x)} f(F(y_0, x)) \ge (1 + \epsilon)^{-1} f_2(F(y_0, x)) = (1 + \epsilon)^{-1} \mu_2(x)_{\max}$.

Together these two inequalities provide the final implication,

(5.4)    $x \in N^{(1)} \cap N^{(2)}(\epsilon)$  $\Rightarrow$  $(1 - \epsilon)\, \mu_1(x)_{\max} \le \mu_2(x)_{\max} \le (1 + \epsilon)\, \mu_1(x)_{\max}$,

which is (2.3) in definition 2.3. □
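The effect of freezing the derivative at $x_0$ can be seen in a one-equation example. The sketch assumes the hypothetical equation $F(y, x) = x y_1 + y_2 - x$ with root $(y_0, x_0) = ((0, 1), 1)$ and the 2-norm, so that $F(y_0, x) = 1 - x$ and $D_1 F(y_0, x) = [x, 1]$; for $p = 1$ both dual values reduce to the residual divided by the norm of the derivative row.

```python
import math

def residual(x):
    """F(y0, x) for the hypothetical F(y, x) = x*y1 + y2 - x with y0 = (0, 1)."""
    return 1.0 - x

def mu1(x):
    """Value of (P_1) in the 2-norm: |F(y0, x)| / ||D1F(y0, x)||, D1F(y0, x) = [x, 1]."""
    return abs(residual(x)) / math.hypot(x, 1.0)

def mu2(x, x0=1.0):
    """Value of (P_2): the same residual, with the derivative frozen at x0."""
    return abs(residual(x)) / math.hypot(x0, 1.0)

# Ratios mu2 / mu1 at x = x0 + h; these should tend to 1 as h shrinks.
ratios = [mu2(1.0 + h) / mu1(1.0 + h) for h in (1e-1, 1e-2, 1e-3)]
```

The ratio is $\sqrt{x^2 + 1}/\sqrt{2}$ in exact arithmetic, which tends to 1 as $x \to x_0 = 1$, in agreement with (5.4).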

6. Third Equivalence, $(P_2)_{\max} \equiv (P_3)_{\max}$. The proof of the last equivalence involves
a class of norms that has been used already in the proof of theorem 5.1. If a linear mapping
$T : \mathbb{R}^m \to \mathbb{R}^p$ is onto, then its adjoint $T^*$ is one-to-one, so $\|T^* f\|$ defines a norm on the
dual space, $(\mathbb{R}^p)^*$. The dual of this norm, viewed as a norm on $\mathbb{R}^p$, is given by the following
construction. All the maximization problems in Table 2.2 are norms of this kind.

LEMMA 6.1. If a linear transformation $T : \mathbb{R}^m \to \mathbb{R}^p$ is onto, then

    $\|v\|_T := \max_{f \,:\, \|T^* f\| \le 1} f(v)$,

is a norm on $\mathbb{R}^p$. (The proof is clear.)

THEOREM 6.2 ($(P_2)_{\max} \equiv (P_3)_{\max}$). Under hypotheses 2.1 (1-5), there is a
neighborhood of $x_0$ where both of optimization problems $(P_2)_{\max}$ and $(P_3)_{\max}$ of Table 2.2 are well
defined. Their values are differentially equivalent at $x_0$ in the sense of definition 2.1.

Under hypotheses 2.1 (1-6), the values of the problems $(P_2)_{\max}$ and $(P_3)_{\max}$ are
asymptotically equal at $x_0$ in the sense of definition 2.3.

Proof. (Part 1.) Let $\|\cdot\|_T$ be the norm given in lemma 6.1 for the linear transformation
$T = D_1 F(y_0, x_0)$. Let $\mathcal{T}(y, x) = D_2 F(y, x_0)(x - x_0) + F(y, x_0)$ be the linear function
parameterized by $y$ whose graph is tangent to the graph of $F(y, x)$ at $x = x_0$. (Note this
is not the $F^{(1)}$ of Table 2.1.) In this notation, $\mu_2(x)_{\max} = \|F(y_0, x)\|_T$ and $\mu_3(x)_{\max} =
\|\mathcal{T}(y_0, x)\|_T$. Thus by the triangle inequality,

    $|\mu_2(x)_{\max} - \mu_3(x)_{\max}| = \big|\, \|F(y_0, x)\|_T - \|\mathcal{T}(y_0, x)\|_T \,\big| \le \|F(y_0, x) - \mathcal{T}(y_0, x)\|_T$.

The difference between $F(y_0, x)$ and $\mathcal{T}(y_0, x)$ is $o(\|x - x_0\|)$ uniformly in $x$ by the definition
of Fréchet differentiability. The same estimate applies in the $\|\cdot\|_T$ norm because all norms
are equivalent in finite dimensional spaces. Therefore

    $\lim_{x \to x_0} \dfrac{|\mu_2(x)_{\max} - \mu_3(x)_{\max}|}{\|x - x_0\|} \le \lim_{x \to x_0} \dfrac{\|F(y_0, x) - \mathcal{T}(y_0, x)\|_T}{\|x - x_0\|} = 0$,

which is (2.2) in definition 2.1.

(Part 2.) The Fréchet differentiability of $F$ together with $F(y_0, x_0) = 0$ implies

$$F(y_0, x) = D_2F(y_0, x_0)(\Delta x) + R(\Delta x)$$

where $\Delta x = x - x_0$ and the remainder $R(\Delta x)$ is $o(\|\Delta x\|)$. Again by the triangle inequality,

$$\|F(y_0,x)\|_T = \|D_2F(y_0,x_0)(\Delta x) + R(\Delta x)\|_T \;\begin{cases} \le \|D_2F(y_0,x_0)(\Delta x)\|_T + \|R(\Delta x)\|_T \\ \ge \|D_2F(y_0,x_0)(\Delta x)\|_T - \|R(\Delta x)\|_T \end{cases}$$

where as noted $\mu_2(x)_{\max} = \|F(y_0,x)\|_T$ and $\mu_3(x)_{\max} = \|D_2F(y_0,x_0)(\Delta x)\|_T$. The
latter is a norm for $\Delta x$ under the present hypothesis that $D_2F(y_0, x_0)$ is one-to-one. Thus, if
$x \ne x_0$, then the inequalities can be divided by $\mu_3(x)_{\max}$ to give,



$$\left| \frac{\mu_2(x)_{\max}}{\mu_3(x)_{\max}} - 1 \right| \le \frac{\|R(\Delta x)\|_T}{\|D_2F(y_0,x_0)(\Delta x)\|_T} .$$

Again by the equivalence of all norms for a finite dimensional space, the upper bound vanishes
in the limit $x \to x_0$ because $R(\Delta x)$ is $o(\|\Delta x\|)$. The vanishing limit implies (2.3) in
definition 2.3. $\Box$
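The two conclusions of theorem 6.2 can be observed on a toy problem. The sketch below is illustrative only and rests on assumptions that are not in the paper: the equation $F(y, x) = y_1^2 + y_2^2 - x^2$ with root $y_0 = (1, 0)$, $x_0 = 1$ is hypothetical, and 2-norms are used throughout. With a single equation ($p = 1$) the norm of lemma 6.1 reduces to $\|v\|_T = |v| / \|D_1F(y_0,x_0)^T\|_2$.

```python
import math

def F(y, x):
    """Hypothetical scalar equation; F(y0, x0) = 0 at y0 = (1, 0), x0 = 1."""
    return y[0] ** 2 + y[1] ** 2 - x ** 2

Y0, X0 = (1.0, 0.0), 1.0
D1F = (2.0 * Y0[0], 2.0 * Y0[1])  # D_1 F(y0, x0) = [2, 0], onto R^1
D2F = -2.0 * X0                   # D_2 F(y0, x0) = -2, one-to-one

def norm_T(v):
    """||v||_T of lemma 6.1 for p = 1 and 2-norms."""
    return abs(v) / math.hypot(*D1F)

def mu2(x):
    return norm_T(F(Y0, x))        # ||F(y0, x)||_T

def mu3(x):
    return norm_T(D2F * (x - X0))  # ||D_2 F(y0, x0)(x - x0)||_T

def compare(dx):
    """Return the quotient in (2.2) and the ratio mu2 / mu3 at x = x0 + dx."""
    x = X0 + dx
    return abs(mu2(x) - mu3(x)) / abs(dx), mu2(x) / mu3(x)

for dx in (1e-1, 1e-2, 1e-3):
    print(dx, compare(dx))
```

For this particular $F$ the quotient equals $|\Delta x|/2$ and the ratio equals $1 + \Delta x/2$ for small $\Delta x$, so the first tends to zero (differential equivalence) and the second tends to one (asymptotic equality).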

7. Summary in Matrix Notation and for 2-Norms. Suppose bases have been chosen
for $\mathbb{R}^m$, $\mathbb{R}^n$, $\mathbb{R}^p$ and a norm has been chosen to measure perturbations in $\mathbb{R}^m$. These choices
express the optimization problems in matrix notation:

1. $J(x)$ is the $p \times n$ Jacobian matrix for $D_2F(y_0, x)$. The entries are the partial derivatives
of $F(y, x)$ with respect to $x$ evaluated at $(y_0, x)$.

2. $K(x)$ is the $p \times m$ Jacobian matrix for $D_1F(y_0, x)$. The entries are the partial derivatives
of $F(y, x)$ with respect to $y$ evaluated at $(y_0, x)$.

3. The residual vector of the equations is $r(x) = F(y_0, x) \in \mathbb{R}^p$.

4. $\|\cdot\|$ is the chosen norm for $\mathbb{R}^m$, and $\|\cdot\|_*$ is the dual norm.

The matrix versions of the problems are in Table 7.1. If $K(x_0)$ has full row rank, then this
paper has shown:

1. The minimizations and maximizations of Table 7.1 are duals (theorem 4.4).

2. The optimal values $\mu_F(x)$, $\mu_1(x)$, $\mu_2(x)$ are asymptotically equal at $x_0$ (theorems
3.8 and 5.1).

3. $\mu_3(x)$ is differentially equivalent to the other values (theorem 6.2).

That is, the values $\mu_i(x)$ approximate $\mu_F(x)$ increasingly well as $x$ nears $x_0$. For 2-norms,
the approximations can be found very simply using the matrix QR factorization.

LEMMA 7.1. Let $A \in \mathbb{R}^{n \times p}$ and $s, u \in \mathbb{R}^p$. If $A$ has full column rank, then for the
$A = QR$ factorization,

$$\max_{\|Au\|_2 \le 1} u^T s = \|R^{-T} s\|_2 .$$

Proof. Because $u^T s = u^T R^T R^{-T} s = (Ru)^T (R^{-T} s)$ and $\|Ru\|_2 = \|Au\|_2 \le 1$, the Cauchy-Schwarz
inequality gives $u^T s \le \|R^{-T} s\|_2$, with equality when $u = R^{-1} R^{-T} s / \|R^{-T} s\|_2$. $\Box$
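Lemma 7.1 translates directly into a short computation. The following is a sketch under stated assumptions rather than the paper's own procedure: the $R$ factor is obtained here by modified Gram-Schmidt in plain Python purely to keep the example self-contained (a library QR routine would normally be used instead), and the matrix `A` and vector `s` are hypothetical data. In the setting of Table 7.1 one would presumably take $A = K(x)^T$ and $s = r(x)$.

```python
import math

def r_factor(A):
    """Upper-triangular R of the thin factorization A = QR (modified Gram-Schmidt)."""
    n, p = len(A), len(A[0])
    V = [[A[i][j] for i in range(n)] for j in range(p)]  # columns of A
    R = [[0.0] * p for _ in range(p)]
    for j in range(p):
        R[j][j] = math.sqrt(sum(w * w for w in V[j]))     # nonzero for full column rank
        V[j] = [w / R[j][j] for w in V[j]]                # V[j] is now column j of Q
        for k in range(j + 1, p):
            R[j][k] = sum(q * w for q, w in zip(V[j], V[k]))
            V[k] = [w - R[j][k] * q for w, q in zip(V[k], V[j])]
    return R

def solve_R_transpose(R, s):
    """Forward substitution for R^T z = s."""
    z = [0.0] * len(s)
    for i in range(len(s)):
        z[i] = (s[i] - sum(R[k][i] * z[k] for k in range(i))) / R[i][i]
    return z

def solve_R(R, w):
    """Back substitution for R u = w."""
    p = len(w)
    u = [0.0] * p
    for i in reversed(range(p)):
        u[i] = (w[i] - sum(R[i][k] * u[k] for k in range(i + 1, p))) / R[i][i]
    return u

def max_linear_over_ellipsoid(A, s):
    """Return (max of u^T s over ||Au||_2 <= 1, a maximizer u); s must be nonzero."""
    R = r_factor(A)
    z = solve_R_transpose(R, s)                 # z = R^{-T} s
    value = math.sqrt(sum(t * t for t in z))    # ||R^{-T} s||_2
    u = solve_R(R, [t / value for t in z])      # u = R^{-1} R^{-T} s / ||R^{-T} s||_2
    return value, u

# Hypothetical data: A is 3 x 2 with full column rank.
A = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
s = [1.0, 2.0]
value, u = max_linear_over_ellipsoid(A, s)
print(value, u)
```

Only $R$ is needed, never $Q$, which is why the sketch discards the orthogonal factor. The returned `u` satisfies $\|Au\|_2 = 1$ and attains the maximum, matching the equality case in the proof.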
Table 7.1
Optimization problems of Table 2.2 in matrix notation. In these formulas, $\Delta x = x - x_0$.

name      value         minimization form                                         dual maximization
(P)       $\mu_F(x)$    $\min_{F(y,x) = 0} \|y - y_0\|$
(P_1)     $\mu_1(x)$    $\min_{K(x)\Delta y = -r(x)} \|\Delta y\|$                $\max_{\|K(x)^T u\|_* \le 1} u^T r(x)$
(P_2)     $\mu_2(x)$    $\min_{K(x_0)\Delta y = -r(x)} \|\Delta y\|$              $\max_{\|K(x_0)^T u\|_* \le 1} u^T r(x)$
(P_3)     $\mu_3(x)$    $\min_{K(x_0)\Delta y = -J(x_0)\Delta x} \|\Delta y\|$    $\max_{\|K(x_0)^T u\|_* \le 1} u^T J(x_0)\Delta x$
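The rows of Table 7.1 can be evaluated by hand for a single equation. The sketch below is an illustration under assumptions that do not come from the paper: the equation $F(y, x) = x\,(y_1^2 + y_2^2) - x^3$ with root $y_0 = (1, 0)$, $x_0 = 1$ is hypothetical, the norm is the 2-norm (which is its own dual), and the exact value $\mu_F(x) = \bigl|\,|x| - 1\,\bigr|$ relies on the solution set being the circle of radius $|x|$ for $x \ne 0$. With $p = 1$ each dual maximization reduces to the absolute value of the right-hand side divided by $\|K^T\|_2$.

```python
import math

Y0, X0 = (1.0, 0.0), 1.0

def r(x):
    """Residual r(x) = F(y0, x) for the hypothetical F(y, x) = x (y1^2 + y2^2) - x^3."""
    return x * (Y0[0] ** 2 + Y0[1] ** 2) - x ** 3

def K(x):
    """1 x 2 Jacobian of F with respect to y, evaluated at (y0, x)."""
    return (2.0 * x * Y0[0], 2.0 * x * Y0[1])

def J(x):
    """1 x 1 Jacobian of F with respect to x, evaluated at (y0, x)."""
    return (Y0[0] ** 2 + Y0[1] ** 2) - 3.0 * x ** 2

def dual_value(K_row, rhs):
    """max of u * rhs subject to ||K^T u||_2 <= 1, for a single equation (p = 1)."""
    return abs(rhs) / math.hypot(*K_row)

def table_values(x):
    """Values of (P), (P_1), (P_2), (P_3) at x, assuming x != 0."""
    dx = x - X0
    mu_F = abs(abs(x) - 1.0)                # distance from y0 to the circle of radius |x|
    mu_1 = dual_value(K(x), r(x))
    mu_2 = dual_value(K(X0), r(x))
    mu_3 = dual_value(K(X0), J(X0) * dx)
    return mu_F, mu_1, mu_2, mu_3

for x in (1.1, 1.01, 1.001):
    print(x, table_values(x))
```

Here $K(x_0) = [2, 0]$ has full row rank, so the hypotheses of the summary hold, and the printed values should agree to more leading digits as $x$ approaches $x_0$.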



Appendix A. Nomenclature and Notation. This appendix lists some standard notation 
that is used without comment throughout the paper. 

1. For $f : \mathcal{D} \subset \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^p$, the Fréchet derivative of $f$ evaluated at $(y, x)$ is
$Df(y, x) \in \hom(\mathbb{R}^{m+n}, \mathbb{R}^p)$. The partial Fréchet derivative of $f$ with respect
to the first space $\mathbb{R}^m$ and evaluated at $(y, x)$ is $D_1 f(y, x) \in \hom(\mathbb{R}^m, \mathbb{R}^p)$, and
similarly for the second space $\mathbb{R}^n$ and $D_2 f$.

2. The dual space of a normed linear space $\mathbb{R}^m$ is the space of functionals $(\mathbb{R}^m)^* =
\hom(\mathbb{R}^m, \mathbb{R})$ with the induced norm. The annihilator of a set $S \subset \mathbb{R}^m$ is the subspace
$S^{\perp} \subset (\mathbb{R}^m)^*$. The subspace annihilated by a set $S \subset (\mathbb{R}^m)^*$ is ${}^{\perp}S \subset \mathbb{R}^m$.
The transpose of $T \in \hom(\mathbb{R}^m, \mathbb{R}^p)$ is $T^* \in \hom((\mathbb{R}^p)^*, (\mathbb{R}^m)^*)$.

3. The interior, boundary, and closure of a set $S$ are indicated by $\mathrm{int}(S)$, $\mathrm{bd}(S)$, and
$\mathrm{cl}(S)$.

4. The open ball with center $c$ and radius $r$ is $B_c(r)$.

5. Six lemmas assert the existence of neighborhoods that are indicated by placing the 
lemma number in a superscript, the point around which the neighborhood lies in a 
subscript, and any parameterization of the neighborhood in parentheses: 

^' 6) N^ Ng%) N^(e) Ng*> N^(e). 